Skip to content

Introduce expression/type formatting functor #2192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Conversation

kroening
Copy link
Member

The interface comes with an exemplar of an application (show_symbol_table).
Note that this can replace util/format_expr.h and util/format_type.h.

This is extensible, i.e., it is possible to make the debug_formatter treat particular cases differently; it is possible to pass further data along with the formatter, say a symbol table, a wide space policy, etc., if needed. The formatter can be stateful (say for indents, parentheses, etc.).

@tautschnig
Copy link
Collaborator

So what about the language-specific output? Is it a sound assumption that all symbols in the symbol table will always belong to a single language?

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in "my" part look OK but I have a bigger question -- how is this going to work with expr2c expr2java, expr2whatever, etc. Without some link to that it seems like just extra wrapping.

@kroening
Copy link
Member Author

An option is to build expr2c (or other languages) by extending this one; the obvious, low-effort step is to add a wrapper.

@kroening
Copy link
Member Author

Language-specific output: The idea is that we use the language of the application: cbmc prints C syntax, jbmc prints java syntax, adabmc prints, ADA, etc. pp
One could contemplate what to do for mixed-language environments.

@tautschnig
Copy link
Collaborator

  1. How about not defining a global, static symbol, and instead forcing each front-end to define it? Then at least the front-end has to make a conscious choice.
  2. goto-diff and goto-instrument will need some extra work, because they should at times generate language-specific output but don't have a fixed source language.

@kroening
Copy link
Member Author

  1. Yes, the idea is that every frontend provides one of these. The one in formatter is really meant for debugging only, and it's hoped that the name makes this clear.

  2. Yes, these do make the case for a formatter factory, similar in style to Peter's light-weight language API.

@tautschnig
Copy link
Collaborator

If the debug formatter weren't a static object then front-ends could also configure it to produce, e.g., json-compatible output. Thus I'd still like to see this static object go away.

@kroening
Copy link
Member Author

debug formatter: but then pass down a formatter basically everywhere that might want to debug?

@tautschnig
Copy link
Collaborator

debug formatter: but then pass down a formatter basically everywhere that might want to debug?

No, you can safely leave the extern debug_formattert debug_formatter; in place and make front-ends actually create that object. Maybe that declaration should become a std::unique_ptr though. An alternative option is to make the debug_formattert configurable - leaving the statically allocated object in place, but allowing front-ends to change its behaviour.

@peterschrammel
Copy link
Member

  1. I think tools called goto-* should really operate on goto-binaries only and also only produce language-independent output.

@kroening
Copy link
Member Author

The debug_formatter is now configurable.

@tautschnig
Copy link
Collaborator

I think tools called goto-* should really operate on goto-binaries only and also only produce language-independent output.

I agree with that in principle, but that requires any such language-independent output to be properly tested and effectively become as good as expr2ct is today. It really means that it is not a debug tool anymore. Users can reasonably expect that there are no regressions visible to them when all that happens is under-the-hood cleanup.

@kroening kroening force-pushed the format-functor branch 2 times, most recently from db211d6 to d46e8e5 Compare May 18, 2018 16:05
@kroening kroening assigned kroening and tautschnig and unassigned kroening May 19, 2018
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure this will eventually be valuable, but right now it still has a few issues:

  1. It is introducing even more objects of static lifetime, which seemed something that was to be gotten rid of.
  2. There is no sensitivity towards plain text/JSON/XML output, with no apparent way of making this possible.
  3. Maybe one of the commits could clarify the "why?"

return formatter->format(os, o);
}

formattert *formatter;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A raw public pointer seems very dangerous. There should be an API around it - in particular as this is, as far as I can tell, how it is made "configurable"?!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will wrap.

std::ostream &format(std::ostream &, const source_locationt &) override;
};

extern default_debug_formattert default_debug_formatter;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be a global object. Create it when creating the debug_formattert. Actually I'd move the entire declaration to the .cpp file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key benefit is that it can then be a default argument. I understand overloading is an alternative, but that also creates many functions.

@tautschnig tautschnig assigned kroening and unassigned tautschnig May 19, 2018
@kroening
Copy link
Member Author

Can you clarify what sensitivity towards XML/JSON output would mean?

@peterschrammel
Copy link
Member

peterschrammel commented May 19, 2018

How will we supply language-specific formatters to be used in the structured JSON/XML output (!= messages)?

@tautschnig
Copy link
Collaborator

Can you clarify what sensitivity towards XML/JSON output would mean?

  1. escaping needs to be done differently (though could be fixed up afterwards if need be).
  2. some output should be structured: an example that immediately comes to mind are source_locationt.

@tautschnig
Copy link
Collaborator

A couple of thoughts as the overall design goals and plans aren't very clear to me. Just my opinion, I would very much welcome comments/feedback/push-back:

  1. We should have an implementation that prints goto-program expressions in a what we designate as goto-program-expression syntax. First, what is a "goto-program expression?" That's any expression appearing in a goto-program, with its semantics defined by our back-ends. In many cases the semantics will be the same as those of similarly-named expressions in C or Java, but this needn't be the case. Also there are some expressions that only exist in goto-program world, and not necessarily in any of the front-end languages (one of those is "byte extract"). We may choose the use a syntax that is very similar to that of C or Java to represent goto-program expressions, and implicitly we have been doing so all along. Let's clean this up and make clear that there is a syntax for goto-program expressions. I would argue that format_expr is taking care of this, once completely done. Completeness should be defined by "for every expression supported by our back-end, there must be a well-defined syntax." Notably, any use of pretty() ought to be replaced by UNREACHABLE. Running --show-goto-functions (while making --show-goto-functions use format) on all regression tests might be a decent starting point to test this.
  2. If the above is acceptable, then there is no need for a debug_formatter or the likes as we would use format_expr as the one and only acceptable form of output for all back-end parts.
  3. Places where front-end-language specific output is to be used should be limited to a) output of front-end expressions (such as warnings or errors during parsing, type-checking or conversion), b) output of results that are to be translated back to the input language, such as counterexample traces or test suites, c) expressions in the symbol table that have not been converted (for example, the value has not necessarily been converted, and should thus be printed using language-specific syntax), and d) dump-c or any other means that very explicitly translates goto-program expressions back to the input-language world.
  4. expr2ct (and others deriving from it) should be changed to create valid C (Java, etc.) expressions in all cases. One example that comes to mind convert_constant_bool which still prints the non-C constants TRUE and FALSE. This would be perfectly fine in goto-program world, but should not exist in the input-language front-end. Support for printing, e.g., "byte_extract" would no longer be provided.
  5. The functors introduced here may be a good idea, but they lack the one bit of configurability that we actually need when producing output: awareness of the target context language, i.e., is it plain text, JSON, XML, or something else? We might want to have a distinct syntax for goto-programs (and goto-program expressions) in plain text/JSON/XML, in which case the point of creating the output needs to be aware. We may also choose not to have this, in which case all we need to watch out for is doing proper escaping before embedding strings in JSON/XML output. Note that for C/Java, there is (AFAIK) no JSON/XML syntax, and thus such a distinction is not necessary - only the escaping part needs taking care of in case we embed such output in JSON/XML.

@peterschrammel
Copy link
Member

@tautschnig, fully agree.

If we want to be really radical, we could go even one step further and define any non-debugging output (I'm talking about goto-programs, traces, etc) in a (structured) way that is independent of plain/JSON/XML and then provide a generic converter from the structured output specification to the respective formats. There is a massive potential for reducing code duplication in the various show_* classes...

@tautschnig
Copy link
Collaborator

If we want to be really radical, we could go even one step further and define any non-debugging output (I'm talking about goto-programs, traces, etc) in a (structured) way that is independent of plain/JSON/XML [...]

Yes, I have thought along those lines as well, with the following result:

  • Most of the debate around format etc is about translating to a single (possibly large) string, i.e., does not collide with this idea.
  • source_locationt are a notable exception that we don't want to turn into just a single string.
  • I think we just need to distinguish string/key-value maps/arrays; mapping those to plain text incidentally is the least obvious/trivial part.
  • Since I don't really know what the current overall plan is I haven't actually made a stab at this. Maybe someone who knows about the plan can share it?

@kroening
Copy link
Member Author

The 'radical' idea certainly has appeal; however, it's not trivial. As simplest example, JSON prefers camlCase, whereas XML uses a different convention. Plain text has further human preferences attached, which requires further special-casing.

I'd be tempted to remove that redundancy by killing XML eventually; fewer and fewer people care.

@martin-cs
Copy link
Collaborator

Although I'm far from a fan of XML, I think there were a number of commercial users for it last time I checked.

@TGWDB
Copy link
Contributor

TGWDB commented Feb 23, 2021

Closing this PR as the discussion implies that a different solution is desirable and this PR is not going to be completed/approved in current form. Unless the discussion/opinions change significantly, recommend a new PR for the new approach rather than reopen this one to continue this approach.

If you believe this has been closed erroneously please reopen.

@TGWDB TGWDB closed this Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants